Code Review
This pull request introduces a significant and beneficial refactoring of the server state management by breaking down the monolithic ServerState class into smaller, more focused manager classes. This greatly improves modularity and maintainability. The adapter/model limit has been moved from a per-model configuration to a global server setting, enforced within the new ModelManager, which aligns with the PR's objective. While the refactoring is well-executed, I've identified a few areas for improvement: a potentially incomplete implementation of the new limit for the Twinkle protocol, some code duplication in the new state manager, and a commented-out service in a configuration file that should be cleaned up.
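As a rough illustration of the composition pattern described above, the monolithic state object becomes a thin facade over focused managers. This is a hypothetical sketch only; the class and attribute names are illustrative, not the actual API in this PR.

```python
# Hypothetical sketch: a slim ServerState delegating to focused manager
# classes instead of one monolithic state object. All names illustrative.
class SessionManager: ...
class ModelManager: ...
class SamplingManager: ...
class FutureManager: ...
class ConfigManager: ...


class ServerState:
    """Facade composing the focused managers described in the review."""

    def __init__(self) -> None:
        self.sessions = SessionManager()
        self.models = ModelManager()
        self.sampling = SamplingManager()
        self.futures = FutureManager()
        self.config = ConfigManager()
```

Each manager owns one concern (sessions, models, sampling sessions, futures, configuration), so cross-cutting policies such as the per-token model limit have a single obvious home.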
Pull request overview
This PR refactors the server state management system to centralize per-token model limits at the ServerState level rather than enforcing them at the adapter level. The primary goal is to fix a bug where adapter limits were checked too late in the model creation flow, allowing users to exceed the limit under certain conditions.
Changes:
- Refactored the single-file `state.py` into a modular package structure (`state/`) with separate managers for sessions, models, sampling sessions, futures, and configuration
- Moved per-token limit enforcement from `AdapterManager` to `ModelManager` within `ServerState`, checking the limit during model registration rather than adapter creation
- Updated the model creation flow in `tinker/model.py` to register models inside the `_create_adapter` coroutine, ensuring atomicity and proper error handling
- Updated YAML configuration files to specify `per_token_model_limit` in `server_config` instead of `per_token_adapter_limit` in `adapter_config`
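The enforcement point described above can be sketched as follows. This is a minimal illustration, assuming hypothetical class, method, and field names; the actual implementation lives in the `ModelManager` this PR introduces.

```python
# Hypothetical sketch of enforcing the per-token model limit at
# registration time rather than later in the adapter-creation path,
# where a late check could let a token exceed the limit.
class ModelLimitError(Exception):
    """Raised when a token tries to register more models than allowed."""


class ModelManager:
    def __init__(self, per_token_model_limit: int) -> None:
        self.per_token_model_limit = per_token_model_limit
        self._models_by_token: dict[str, list[str]] = {}

    def register_model(self, token: str, model_id: str) -> None:
        # Check-and-insert happens in one place, so the limit cannot be
        # bypassed by a request that reaches adapter creation first.
        models = self._models_by_token.setdefault(token, [])
        if len(models) >= self.per_token_model_limit:
            raise ModelLimitError(
                f"token already has {len(models)} models "
                f"(limit {self.per_token_model_limit})"
            )
        models.append(model_id)
```

Failing fast at registration means the error surfaces before any adapter resources are allocated, which also simplifies cleanup on the error path.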
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 2 comments.
Summary per file:
| File | Description |
|---|---|
| src/twinkle/server/utils/state.py | Deleted monolithic state file |
| src/twinkle/server/utils/state/base.py | New abstract base manager class with common CRUD and cleanup operations |
| src/twinkle/server/utils/state/models.py | Pydantic models for session, model, sampling session, and future records |
| src/twinkle/server/utils/state/session_manager.py | Session lifecycle management with heartbeat tracking |
| src/twinkle/server/utils/state/model_manager.py | Model registration with per-token limit enforcement |
| src/twinkle/server/utils/state/sampling_manager.py | Sampling session lifecycle management |
| src/twinkle/server/utils/state/future_manager.py | Async task future/request status tracking |
| src/twinkle/server/utils/state/config_manager.py | Key-value configuration storage |
| src/twinkle/server/utils/state/server_state.py | Unified ServerState class composing all managers, plus Ray proxy and factory |
| src/twinkle/server/utils/state/\_\_init\_\_.py | Package exports |
| src/twinkle/server/utils/adapter_manager.py | Removed per_token_adapter_limit parameter and check_adapter_limit method |
| src/twinkle/server/twinkle/model.py | Updated comment to reflect that limit check no longer happens in register_adapter |
| src/twinkle/server/tinker/server.py | Removed unused dataclasses import |
| src/twinkle/server/tinker/model.py | Moved model registration inside _create_adapter coroutine, added null check before cleanup, removed model_id from schedule_task call, updated comment |
| cookbook/client/twinkle/transformer/server_config.yaml | Added server_config.per_token_model_limit: 3, removed adapter_config.per_token_adapter_limit |
| cookbook/client/twinkle/megatron/server_config.yaml | Added server_config.per_token_model_limit: 3, removed adapter_config.per_token_adapter_limit |
| cookbook/client/tinker/transformer/server_config.yaml | Added server_config.per_token_model_limit: 3, removed adapter_config.per_token_adapter_limit |
| cookbook/client/tinker/self_congnition.py | Removed unused numpy import |
| cookbook/client/tinker/megatron/server_config_7b.yaml | Added server_config.per_token_model_limit: 1, removed adapter_config.per_token_adapter_limit, commented out sampler service |
| cookbook/client/tinker/megatron/server_config.yaml | Added server_config.per_token_model_limit: 3, removed adapter_config.per_token_adapter_limit from model and sampler |
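The YAML changes across the cookbook configs all follow the same pattern; an illustrative before/after (surrounding keys and values assumed, not taken from the actual files):

```yaml
# Before (removed): the limit lived per-adapter
# adapter_config:
#   per_token_adapter_limit: 3

# After (added): a global server-level setting enforced by ModelManager
server_config:
  per_token_model_limit: 3
```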